phenix.refine is a program within the PHENIX package that 
supports crystallographic structure refinement against experi- 
mental data with a wide range of upper resolution limits using 
a large repertoire of model parameterizations. It has several 
automation features and is also highly flexible. Several 
hundred parameters enable extensive customizations for 
complex use cases. Multiple user-defined refinement strategies 
can be applied to specific parts of the model in a single 
refinement run. An intuitive graphical user interface is 
available to guide novice users and to assist advanced users 
in managing refinement projects. X-ray or neutron diffraction 
data can be used separately or jointly in refinement. 
phenix.refine is tightly integrated into the PHENIX suite, 
where it serves as a critical component in automated model 
building, final structure refinement, structure validation and 
deposition to the wwPDB. This paper presents an overview 
of the major phenix.refine features, with extensive literature 
references for readers interested in more detailed discussions 
of the methods. 

1. Introduction 

Crystallographic structure refinement is a complex procedure 
that combines a large number of very diverse steps, where 
each step may be very complex itself. Each refinement run 
requires selection of a model parameterization, a refinement 
target and an optimization method. These decisions are often 
dictated by the experimental data quality (completeness and 
resolution) and the current model quality (how complete the 
model is and the level of error in the atomic parameters). The 
diversity of data qualities (from ultrahigh to very low resolu- 
tion) and model qualities (from crude molecular-replacement 
results to well refined near-final structures) generates the 
need for a large variety of possible model parameterizations, 
refinement targets and optimization methods. 

Model parameters are variables used to describe the crystal 
content and its properties. Model parameters can be broken 
down into two categories: (i) those that describe the atomic 
model (atomic model parameters), such as atomic coordinates, 
atomic displacement parameters (ADPs), atomic occupancies 
and anomalous scattering terms ($f'$ and $f''$), and (ii) non-atomic 
model parameters that describe bulk solvent, twinning, 
crystal anisotropy and so on. The parameters that describe the 
crystal are combined and expressed through the total model 
structure factors $F_{\text{model}}$, which are expected to match the 
corresponding observed values $F_{\text{obs}}$ and other experimentally 
derived data (e.g. experimental phase information). 

A refinement target is a mathematical function that 
quantifies the fit between the model parameters (expressed 
through $F_{\text{model}}$) and the experimental data (amplitudes, $F_{\text{obs}}$, 
or intensities, $I_{\text{obs}}$, and experimental phases if available). 
Typically, target functions are defined such that their value 
decreases as the model improves. This in turn formulates the 
goal of a crystallographic structure refinement as an optimi- 
zation problem in which the model parameters are modified in 
order to achieve the lowest possible value of the target func- 
tion or, in other words, minimization of the refinement target. 

Algorithms to optimize the refinement target range from 
gradient-driven minimization, simulated-annealing-based 
methods and grid searches to interactive model building in a 
graphical environment. These methods vary in speed, scal- 
ability, convergence radius and applicability to current model 
parameters. The type of parameters to be optimized, the 
number of refinable parameters and the current model 
quality may all dictate the choice of optimization (target- 
minimization) method. 

Below, we describe how crystallographic structure refine- 
ment is implemented in phenix.refine. 

2. Methods 

Crystallographic structure refinement can be performed in 
PHENIX (Adams et al, 2002, 2010) using X-ray data, neutron 
data or both types of data simultaneously. Highly customized 
refinement strategies are available for a broad range of 
experimental data resolutions from ultrahigh resolution, 
where an interatomic scatterer (IAS) model can be used to 
model bonding features (Afonine et al, 2004, 2007), to low 
resolution, where the use of torsion-angle parameterization 
(Rice & Brünger, 1994; Grosse-Kunstleve et al, 2009) and 
specific restraints for coordinates [reference-model, secondary-structure, 
noncrystallographic symmetry (NCS) and Ramachandran 
plot restraints] may be essential (Headd et al, 2012). 
A highly optimized automatic rigid-body refinement protocol 
(Afonine et al, 2009) is available to facilitate initial stages of 
refinement when the starting model may contain large errors 
or as the only option at very low resolution. Most refinement 
strategies can be combined with each other and applied to any 
selected part of the structure. Specific tools are available for 
refinement using neutron data, such as automatic detection, 
building and refinement of exchangeable H/D sites and 
difference electron-density map-based building of D atoms 
for water molecules (Afonine, Mustyakimov et al, 2010). Most 
of the refinement strategies available for refinement against 
X-ray data are also available for refinement using neutron 
data. Refinement of individual coordinates can be performed 
in real or reciprocal space or consecutively in both (dual-space 
refinement). Refinement against data collected from twinned 
crystals is also possible. 

The high degree of flexibility and extensive functionality 
of phenix.refine has been made possible by modern software- 
development approaches. These approaches include the use of 
object-oriented languages, where the convenience of scripting 
and ease of use in Python are augmented by the speed of C++, 
and by a library-based development approach, where each of 
the major building blocks is implemented as a reusable set of 
modules. Most of the modules are available through the open-source 
CCTBX libraries (Grosse-Kunstleve & Adams, 2002; 
Grosse-Kunstleve et al, 2002). An overview of the underlying 
open-source libraries can be found in a series of recent IUCr 
Computing Commission Newsletter articles (issues 1-8; 
http://www.iucr.org/iucr-top/comm/ccom/newsletters/). 

The refinement protocol implemented in phenix.refine 
(Afonine et al, 2005b) consists of three main parts. 

Initialization: includes processing of input data and the job- 
control parameters, analysis and refinement-strategy selection 
and a number of consistency checks. 

Macro-cycle: the main body of refinement, a repeatable 
block where the actual model refinement occurs. 

Output: the concluding step where the refined model, 
electron-density maps and many statistics are reported. 
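
The three-part organization described above can be pictured as a simple driver loop. The following is a minimal sketch for illustration only: `run_refinement`, `toy_step` and the `r_work` dictionary key are hypothetical names and are not part of phenix.refine.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def run_refinement(state: State,
                   steps: List[Callable[[State], State]],
                   n_macro_cycles: int = 3) -> State:
    """Initialization -> repeated macro-cycle -> output (hypothetical sketch)."""
    # Initialization: a consistency check on the inputs.
    if "r_work" not in state:
        raise ValueError("state must carry an initial 'r_work' value")
    r_work_start = state["r_work"]
    # Macro-cycle: apply every refinement step in turn, several times.
    for _ in range(n_macro_cycles):
        for step in steps:
            state = step(state)
    # Output: return the refined state together with simple statistics.
    result = dict(state)
    result["n_cycles_run"] = n_macro_cycles
    result["r_work_start"] = r_work_start
    return result


def toy_step(state: State) -> State:
    # Stand-in for a real step: pretend that each step lowers R_work by 5%.
    new_state = dict(state)
    new_state["r_work"] = state["r_work"] * 0.95
    return new_state
```

In a real run each step would be one of the operations described in the following sections (scaling, solvent update, coordinate or ADP refinement), and the loop would also monitor convergence.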

The following sections outline the key steps of structure 
refinement in phenix.refine. 

2.1. Initial step of refinement: processing of inputs 

To initiate refinement, a number of major sources of 
information have to be processed. 

(i) Structural model: coordinates, displacement parameters, 
occupancies, atom types, $f'$ and $f''$ for anomalous scatterers (if 
present). 

(ii) Reflection data: pre-processed observed intensities or 
amplitudes of structure factors and, optionally, experimental 
phases. 

(iii) Parameters determining the refinement protocol. 

(iv) Empirical geometry restraints: bond lengths, bond 
angles, dihedral angles, chiralities and planarities (Engh & 
Huber, 1991; Grosse-Kunstleve, Afonine et al, 2004). 

(v) Optionally, a restraint library file (CIF) may be provided 
to define the stereochemistry of entities in the input model 
(for example, ligands) that do not have corresponding 
restraints in the library included in the PHENIX distribution. 

The user provides the structural model and reflection data. 
The refinement software then retrieves default parameters 
and information from a library of empirical geometry 
restraints, which can be readily customized by the user. 

The PDB format (Bernstein et al., 1977; Berman et al., 2000) 
is the most commonly used format for exchanging macro- 
molecular model data and is therefore available as the input 
format for refinement in PHENIX. The iotbx.pdb library 
module (Grosse-Kunstleve & Adams, 2010) performs the first 
stage of the PDB-file interpretation. It robustly constructs an 
internal hierarchy of models (PDB MODEL keyword), chains, 
conformers (PDB altLoc identifier), residues and atoms. 
Common simple formatting problems are corrected on the fly 
where possible. Currently, phenix.refine can only make use of 
PDB files containing a single model. The second stage of the 
PDB interpretation involves matching the structural data 
with definitions in the CCP4 Monomer Library (Vagin & 
Murshudov, 2004; Vagin et al, 2004) in order to derive 
geometry restraints, scattering types and nonbonded energy 
types. Many common simple formatting and naming problems 
are considered in this interpretation. The PDB interpretation 
(iotbx.pdb) has been tested with all files found in the 
PDB database (http://www.pdb.org/) as of August 2011 and 
supports both PDB version 2.3 and version 3.x atom-naming 
conventions. The vast majority of files can be processed 
without user intervention. Detailed diagnostic messages help 
the user to quickly identify idiosyncrasies in the PDB file that 
cannot be automatically corrected. If the input PDB file 
contains an item undefined in the CCP4 Monomer Library, a 
geometry restraint (CIF) file must be provided for that item. 
This file can be obtained by running phenix.elbow (Moriarty et 
al, 2009) or phenix.ready_set, which is more comprehensive 
and automated. 

The experimental data can be provided in many commonly 
used formats. Multiple input files can be given simultaneously, 
e.g. a SCALEPACK file (Otwinowski & Minor, 1997) with 
observed intensities, a CNS (Brünger et al, 1998) file with $R_{\text{free}}$ 
flags (Brünger, 1992, 1993) and an MTZ file (Winn et al, 2011) 
with phase information. A comprehensive procedure aims to 
extract the data most suitable for refinement without user 
intervention. A preliminary crystallographic data analysis is 
performed in order to detect and ignore potential reflection 
outliers (Read, 1999). If twinning (for a review, see 
Parsons, 2003; Helliwell, 2008) is suspected, a user can run 
phenix.xtriage (Zwart et al, 2005) to obtain a twin-law 
operator to be used by the twin-refinement target in 
phenix.refine. 

A number of automatic adjustments to the refinement 
strategy are considered at this point. These adjustments 
include automatic choice of refinement target if necessary 
(based on the number of test reflections, the presence of 
twinning and the availability of experimental phase informa- 
tion as Hendrickson-Lattman coefficients; Hendrickson & 
Lattman, 1970), specifying the atomic displacement para- 
meters (isotropic or anisotropic), determining whether or not 
to add ordered solvent (if the resolution is sufficient), auto- 
matic detection or adjustment of user-provided NCS selec- 
tions, determining the set of atoms that should have their 
occupancies refined and automatic determination of occu- 
pancy constraints for atoms in alternative conformations. 
When joint refinement is performed using both X-ray and 
neutron data (Coppens et al, 1981; Wlodawer & Hendrickson, 
1981, 1982; Adams et al, 2009; Afonine, Mustyakimov et al, 
2010), it is important to ensure that the cross-validation 
reflections are consistent between data sets. This check 
is performed automatically. If a mismatch is detected, 
phenix.refine will terminate and offer to generate a new set of 
flags consistent with both data sets. 

The large set of configurable refinement parameters is 
presented to the user in a novel hierarchical organization, 
libtbx.phil, specifically designed to be user-friendly 
(Grosse-Kunstleve et al, 2005). This is achieved via a simple 
syntax with the option to easily override selected parameters 
from the command line. This parameter-handling framework 
is completely general and can be reused for other purposes 
unrelated to refinement. A comprehensive and intuitive 
graphical user interface (GUI) built around this framework 
is also available, allowing users of all skill levels to use 
phenix.refine. 
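
The idea of a hierarchical parameter set with command-line overrides can be illustrated with a small sketch. This is not the libtbx.phil API: `apply_overrides`, the `DEFAULTS` dictionary and the parameter names in it are assumptions made for illustration, using plain nested dictionaries and `scope.name=value` strings.

```python
import copy


def apply_overrides(defaults, overrides):
    """Return a copy of nested dict `defaults` with 'a.b.c=value' overrides applied."""
    params = copy.deepcopy(defaults)
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.strip().split(".")
        scope = params
        # Walk down the hierarchy to the scope that holds the parameter.
        for key in keys[:-1]:
            if key not in scope or not isinstance(scope[key], dict):
                raise KeyError(f"unknown scope: {path}")
            scope = scope[key]
        leaf = keys[-1]
        if leaf not in scope:
            raise KeyError(f"unknown parameter: {path}")
        # Convert the string to the type of the default value.
        default = scope[leaf]
        if isinstance(default, bool):
            scope[leaf] = raw.strip().lower() in ("true", "1", "yes")
        else:
            scope[leaf] = type(default)(raw.strip())
    return params


# Illustrative defaults only; the names are not taken from the program.
DEFAULTS = {
    "refinement": {
        "main": {"number_of_macro_cycles": 3, "ordered_solvent": False},
    }
}
```

Rejecting unknown names, as done here, is one way to catch misspelled parameters before a long refinement run starts.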



2.2. The main body of refinement: the refinement macro- 
cycle 

A refinement protocol typically consists of several steps, in 
which each step aims to optimize specific model parameters 
using dedicated methods. This is because of the following. 

(i) The target function typically has many local minima. The 
objective of refinement is to approach the deepest minimum as 
closely as possible. A gradient-driven minimization can reach 
only the nearest local minimum; therefore, sophisticated 
search algorithms such as rotamer optimization (recently 
implemented in phenix.refine) or simulated annealing 
(Brünger et al, 1987; Adams et al, 1997; Brünger et al, 2001; 
Brünger & Adams, 2002) may need to be applied. 

(ii) Some groups of model parameters are highly correlated, 
e.g. isotropic displacement parameters and the exponential 
component of the overall scale-factor correction, ADPs 
and occupancies (Cheetham et al, 1992), rigid-body ADPs 
modeled through TLS (for a review, see Urzhumtsev et al, 
2011), local atomic vibrations, and anisotropic scale and bulk-solvent 
parameters, $k_{\text{sol}}$ and $B_{\text{sol}}$ (Tronrud, 1997; Fokine & 
Urzhumtsev, 2002). 

(iii) Different minimization methods imply different 
convergence radii for different model parameters (such as, 
for example, coordinates and ADPs) or for the same kind of 
parameters that have a large spread in magnitude (Agarwal, 
1978; Tronrud, 1994). 

(iv) As the model improves during refinement, a different 
model parameterization may be more appropriate. If addi- 
tional model features become visible in the difference maps, 
such as new water molecules or ions, they may need to be 
reflected by additions or changes to the model. Further, 
erroneously modeled waters from earlier steps may need to 
be removed after a few macro-cycles since their ADPs and/or 
distances to other molecules may refine to implausible values. 

The refinement protocol therefore consists of multiple steps 
repeated iteratively, in which each step is specifically tailored 
to the refinement of particular parameters. The required 
number of such steps depends on the data quality and initial 
model quality. Convergence of the particular refinement run is 
reached if the optimization of the model parameters does not 
lead to a significant improvement in the monitored criteria 
(refinement target function and R factors, for example). This 
section reviews the refinement steps. 

2.2.1. Total model structure factor, bulk-solvent correc- 
tion, scaling and twin-fraction refinement. The total model 
structure factor comprises a number of contributions, 



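$$F_{\text{model}} = k_{\text{overall}} \exp(-2\pi^2 \mathbf{h}^t \mathbf{U}_{\text{cryst}} \mathbf{h}) \left[ F_{\text{calc}} + k_{\text{sol}} \exp\left(-\frac{B_{\text{sol}} s^2}{4}\right) F_{\text{mask}} \right], \tag{1}$$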



where $k_{\text{overall}}$ is an overall scale factor, $\mathbf{U}_{\text{cryst}}$ is the overall 
anisotropic scale matrix (Sheriff & Hendrickson, 1987; 
Grosse-Kunstleve & Adams, 2002), $\mathbf{h}$ is a column vector with 
the Miller indices of a reflection and $\mathbf{h}^t$ is its transpose, $F_{\text{calc}}$ are 
the structure factors computed from the atomic model, $k_{\text{sol}}$ 
and $B_{\text{sol}}$ are flat bulk-solvent model parameters (Phillips, 1980; 
Jiang & Brünger, 1994), $s^2 = \mathbf{h}^t \mathbf{G}^* \mathbf{h}$, where $\mathbf{G}^*$ is the reciprocal-space 
metric tensor, and $F_{\text{mask}}$ are structure factors calculated 
from a solvent mask (a binary function with zero values in the 
protein region and nonzero values in the solvent region). 
The mask is computed using memory-efficient exact asymmetric 
units described in Grosse-Kunstleve et al (2011). The 
mask-calculation parameters, $r_{\text{solvent}}$ and $r_{\text{shrink}}$, can be optimized 
in each refinement macro-cycle. 
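
As a worked illustration of equation (1), the sketch below evaluates the total model structure factor for a single reflection. It is a minimal example, not the phenix.refine implementation: the function names are hypothetical, and it assumes that the overall anisotropic scale matrix is expressed in the basis in which it can be contracted directly with the Miller indices.

```python
import math


def quad_form(h, m):
    """Return h^t M h for a 3-vector h and a 3x3 matrix M."""
    return sum(h[i] * m[i][j] * h[j] for i in range(3) for j in range(3))


def f_model(h, f_calc, f_mask, k_overall, u_cryst, k_sol, b_sol, g_star):
    """Total model structure factor for one reflection, following equation (1).

    h       : Miller indices (h, k, l)
    f_calc  : complex structure factor from the atomic model
    f_mask  : complex structure factor from the solvent mask
    u_cryst : 3x3 overall anisotropic scale matrix (assumed basis, see text)
    g_star  : 3x3 reciprocal-space metric tensor
    """
    s_sq = quad_form(h, g_star)                       # s^2 = h^t G* h
    aniso = math.exp(-2.0 * math.pi ** 2 * quad_form(h, u_cryst))
    solvent = k_sol * math.exp(-b_sol * s_sq / 4.0) * f_mask
    return k_overall * aniso * (f_calc + solvent)
```

Because of the factor $\exp(-B_{\text{sol}} s^2/4)$, the bulk-solvent term contributes mainly to low-resolution reflections (small $s^2$) and fades quickly at higher resolution.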

The structure factors from the atomic model, $F_{\text{calc}}$, are 
computed using either fast Fourier transformation (FFT) or 
direct-summation algorithms (for a review, see Afonine & 
Urzhumtsev, 2004). Various X-ray and neutron scattering 
dictionaries are available (Neutron News, 1992; Maslen et al, 
1992; Waasmaier & Kirfel, 1995; Grosse-Kunstleve, Sauter et 
al, 2004). 

phenix.refine uses a very efficient and robust algorithm for 
finding the best values for $k_{\text{sol}}$, $B_{\text{sol}}$ and $\mathbf{U}_{\text{cryst}}$. The details of 
the algorithm, as well as a comprehensive set of references 
to relevant works, have been described previously (Afonine 
et al, 2005b). A radial-shell bulk-solvent model (Jiang & 
Brünger, 1994) is also available. In the case of refinement 
against twinned data, the total model structure factor is 
defined as 



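$$F_{\text{model}}^{\text{twin}} = \left[ \alpha |F_{\text{model}}(\mathbf{h})|^2 + (1 - \alpha) |F_{\text{model}}(\mathbf{T}\mathbf{h})|^2 \right]^{1/2}, \tag{2}$$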



where $\alpha$ is the twin fraction, determined by minimizing the 
R factor using a simple grid search in the [0, 0.5] range with a 
step of 0.01, and the matrix $\mathbf{T}$ defines the twin operator. 
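
The grid search for the twin fraction can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the observed and model amplitudes are assumed to be on the same scale already, and the R factor is the usual sum of absolute amplitude differences divided by the sum of observed amplitudes.

```python
def twinned_amplitude(f_h, f_th, alpha):
    """Amplitude for a twinned crystal from F_model(h) and F_model(Th), as in equation (2)."""
    return (alpha * abs(f_h) ** 2 + (1.0 - alpha) * abs(f_th) ** 2) ** 0.5


def r_factor(f_obs, f_mod):
    """Conventional R factor; both sets are assumed to share one scale."""
    return sum(abs(o - m) for o, m in zip(f_obs, f_mod)) / sum(f_obs)


def best_twin_fraction(f_obs, f_model_h, f_model_th, step=0.01):
    """Scan alpha over [0, 0.5] and return (alpha, R) with the lowest R factor."""
    best_alpha, best_r = 0.0, float("inf")
    n_steps = int(round(0.5 / step))
    for i in range(n_steps + 1):
        alpha = i * step
        f_twin = [twinned_amplitude(a, b, alpha)
                  for a, b in zip(f_model_h, f_model_th)]
        r = r_factor(f_obs, f_twin)
        if r < best_r:
            best_alpha, best_r = alpha, r
    return best_alpha, best_r
```

With a step of 0.01 the scan needs only 51 evaluations of the R factor, so an exhaustive search over this single parameter is inexpensive.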

2.2.2. Ordered solvent (water) modeling. An automated 
protocol for updating the ordered solvent model can be 
applied during the refinement process. If requested by the 
user, waters are updated (added, removed and refined) in each 
macro-cycle as indicated in Fig. 1. Updating the ordered 
solvent model involves the following steps. 

(i) Elimination of waters present in the initial model based 
on user-defined cutoff criteria for ADP, occupancy and inter- 
atomic distances (water-water and macromolecule-water), 
$2mF_{\text{obs}} - DF_{\text{model}}$ (see §2.3.1 for details) map values at water 
oxygen centers and map correlation coefficient values 
computed for each water O atom. 

(ii) Location of new peaks in the $mF_{\text{obs}} - DF_{\text{model}}$ map, 
followed by filtering of these peaks by their height and 
distance to other atoms. The filtered peaks are treated as new 
water O atoms with isotropic or anisotropic ADPs as specified 
by the user. 

(iii) Depending on the refinement strategy (typically at high 
resolution), occupancies and individual isotropic or aniso- 
tropic ADPs of newly added water molecules can be refined 
prior to the refinement of all other parameters. This step 
is important because the newly placed waters have only 
approximate ADP values (usually the average B 
calculated from the structure). If a large number of new waters 
are added at once this may significantly increase the R factors 
at this step and have an impact on convergence of the 
refinement. In our experience, this effect is most pronounced 
for high-resolution data. 



(iv) Unlike macromolecular atoms that are connected to 
each other via geometry restraints, the electron density is 
typically the only term in the target function keeping the O 
atom of a water molecule in place and occasionally it may 
happen that a density peak is insufficiently strong to keep a 
water molecule from drifting away during refinement. 
Therefore, in phenix.refine the water O-atom positions are 
analyzed with respect to the local density peaks and water 
molecules are automatically re-centered if necessary. 

(v) For refinement using neutron or ultrahigh-resolution 
X-ray data, water H or D atoms can be automatically located 
in the $mF_{\text{obs}} - DF_{\text{model}}$ map and added to the model. 
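
The peak-filtering step (ii) can be sketched as below. This is a minimal illustration, not the program's procedure: `filter_water_peaks` is a hypothetical name, and the cutoff values (peak height in map r.m.s. units, distances in Å) are assumed placeholders rather than phenix.refine defaults.

```python
import math


def filter_water_peaks(peaks, atoms, min_height=3.0, min_dist=1.8, max_dist=6.0):
    """Select difference-map peaks that are plausible new water O atoms.

    peaks : list of ((x, y, z), height) candidate peaks
    atoms : non-empty list of (x, y, z) positions of atoms already in the model
    """
    accepted = []
    # Strongest peaks are examined first so that they take precedence.
    for xyz, height in sorted(peaks, key=lambda p: p[1], reverse=True):
        if height < min_height:
            continue
        # Distance to the nearest model atom or previously accepted water.
        placed = [p for p, _ in accepted]
        nearest = min(math.dist(xyz, a) for a in list(atoms) + placed)
        if min_dist <= nearest <= max_dist:
            accepted.append((xyz, height))
    return accepted
```

The lower distance bound rejects peaks that clash with existing atoms, while the upper bound rejects isolated peaks that have no neighbour to interact with.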

2.2.3. Refinement targets and target weights. Model para- 
meters, such as coordinates and ADPs, are not refined 
simultaneously but at separate steps (see §2.2 for details). 
phenix.refine uses the following refinement target function for 
restrained refinement of individual coordinates, 



$$T_{\text{xyz}} = wxc_{\text{scale}} \cdot wxc \cdot T_{\text{exp}} + wc \cdot T_{\text{xyz\_restraints}}. \tag{3}$$



A similar function is used in restrained ADP refinement, 



$$T_{\text{adp}} = wxu_{\text{scale}} \cdot wxu \cdot T_{\text{exp}} + wu \cdot T_{\text{adp\_restraints}}. \tag{4}$$

Here, $T_{\text{exp}}$ is the crystallographic term that relates the 
experimental data to the model structure factors. It can be 
a least-squares target (LS; for example, as defined in Afonine 
et al, 2005a), an amplitude-based maximum-likelihood target 
(ML; for example, as defined in Afonine et al, 2005a) or a 
phased maximum-likelihood target (MLHL; Pannu et al, 
1998). For refinement of coordinates, $T_{\text{exp}}$ can also be defined 
in real space (see below). 

$T_{\text{xyz\_restraints}}$ and $T_{\text{adp\_restraints}}$ are restraint terms that introduce 
a priori knowledge, thus helping to compensate for 
the insufficient amount of experimental data owing to finite 
resolution or incompleteness of the data set typically observed 
in macromolecular crystallography. Note that the restraint 
terms are not used in certain situations, for example rigid-body 
coordinate refinement, TLS refinement, occupancy refinement, 
$f'$/$f''$ refinement or if the data-to-parameter ratio is 
extremely high. In these cases the total refinement target is 
reduced to $T_{\text{exp}}$. 

Figure 1 

Flowchart of structure refinement as implemented in phenix.refine. The 
execution of some steps is subject to user-defined options. The main 
refinement body (shown with the gray arrow) is called a macro-cycle and 
is repeated several times. See text for details. 
[Flowchart boxes: input data and model processing; refinement strategy selection; bulk solvent/anisotropic scaling/twin fraction; ordered solvent addition and removal; target weight calculation.] 

The weights $wxc_{\text{scale}}$, $wxc$ and $wc$ (or $wxu_{\text{scale}}$, $wxu$ and $wu$, 
correspondingly) are used to balance the relative contributions 
of experimental and restraints terms. The automatic 
weight-estimation procedure is implemented as described 
in Brünger et al. (1989) and Adams et al. (1997) with some 
variations and is used by default to calculate $wxc$ and $wxu$. The 
long-term experience of using a similar scheme in CNS and 
PHENIX indicates that it is typically robust and provides a 
good estimate of weights in most cases, especially at medium 
to high resolution. In cases where this procedure fails to 
produce optimal weights, a more time-intensive automatic 
weight-optimization procedure may be used, as originally 
described by Brünger (1992) and further adopted by Afonine 
et al. (2011), in which an array of $wxc_{\text{scale}}$ or $wxu_{\text{scale}}$ values is 
systematically tested in order to find the value that minimizes 
$R_{\text{free}}$ while keeping the overall model geometry deviations 
from ideality within a predefined range. The weight $wc$ (or $wu$, 
correspondingly) is used to scale the restraints contribution, 
mostly duplicating the function of $wxc_{\text{scale}}$ (or $wxu_{\text{scale}}$), while 
allowing an important unique option of excluding the 
restraints if necessary (for example, at subatomic resolution). 
Setting $wc = 0$ (or $wu = 0$) reduces the total refinement target 
to $T_{\text{exp}}$. 
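
The weighted target and the weight-optimization scan can be sketched as follows. This is an illustration under stated assumptions: `total_target` and `optimize_weight` are hypothetical names, `refine_with_scale` stands for any callable that runs a trial refinement and returns ($R_{\text{free}}$, bond r.m.s.d., angle r.m.s.d.), and the geometry limits shown are placeholders rather than the program's values.

```python
def total_target(t_exp, t_restraints, wxc_scale, wxc, wc=1.0):
    """Weighted sum of the experimental and restraint terms."""
    return wxc_scale * wxc * t_exp + wc * t_restraints


def optimize_weight(trial_scales, refine_with_scale,
                    max_bond_rmsd=0.02, max_angle_rmsd=2.5):
    """Return (scale, r_free) with the lowest R_free among acceptable geometries.

    refine_with_scale(scale) -> (r_free, bond_rmsd, angle_rmsd) is assumed to
    run one trial refinement starting from the same model each time.
    """
    best = None
    for scale in trial_scales:
        r_free, bond_rmsd, angle_rmsd = refine_with_scale(scale)
        # Discard trials whose geometry deviates too far from ideality.
        if bond_rmsd > max_bond_rmsd or angle_rmsd > max_angle_rmsd:
            continue
        if best is None or r_free < best[1]:
            best = (scale, r_free)
    return best  # None if no trial satisfied the geometry limits
```

Each trial requires a full refinement, which is why this scan is considerably more time-intensive than the default weight estimate.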

In maximum-likelihood (ML)-based refinement (Pannu & 
Read, 1996; Bricogne & Irwin, 1996; Murshudov et al, 1997; 
Adams et al, 1997; Pannu et al, 1998) the calculation of the 
ML target (Lunin & Urzhumtsev, 1984; Read, 1986, 1990; 
Lunin & Skovoroda, 1995) requires an estimation of model 
error parameters, which depend on the current atomic para- 
meters and bulk-solvent model and scales. Since the atomic 
parameters and the bulk-solvent model are updated during 
refinement, the ML error model has to be updated corre- 
spondingly, as described in Lunin & Skovoroda (1995), 
Urzhumtsev et al. (1996) and Afonine et al. (2005a). 

2.2.4. Refinement of coordinates. Depending on the 
resolution (or more formally the data-to-parameter ratio; 
Urzhumtsev et al, 2009) and initial model quality, there 
are four main options for refinement of coordinates in 
phenix.refine: individual unrestrained (at subatomic resolu- 
tion), individual restrained, constrained rigid-groups (also 
known as torsion-angle) or pure rigid-body refinement. 
Restrained individual coordinate refinement can be 
performed in real and/or reciprocal space. Coordinate 
refinement is performed using L-BFGS minimization (Liu & 
Nocedal, 1989) of the target $T_{\text{xyz}}$ (3) with respect to atomic 
positional parameters (individual coordinates or rotation-translation 
parameters of rigid bodies or torsion-angle space 
variables), while keeping all other parameters fixed. Simulated 
annealing (SA) is an alternative option for optimizing the 
target $T_{\text{xyz}}$ (3) and is known to be a powerful tool for escaping 
from local minima and therefore increasing the convergence 
radius of refinement (Brünger et al, 1987). This option is 
available and can be used depending on the model and data 



quality, as well as the stage of refinement. SA can be 
performed in Cartesian or torsion-angle space (Grosse- 
Kunstleve et al, 2009). 

A highly optimized protocol for pure rigid-body refinement 
is available (the MZ protocol), in which the refinement begins 
with the lowest resolution zone using a few hundred low- 
resolution reflections and gradually proceeds to higher reso- 
lution by adding an optimal number of high-resolution 
reflections in each step (Afonine et al, 2009). All of the 
parameters of this protocol have been selected to achieve 
the largest convergence radius with a minimal runtime. 
The algorithm does not require a user to truncate the high- 
resolution limits at ad hoc values. 

Real-space refinement (RSR) of coordinates has a long 
history (Diamond, 1971; Deisenhofer et al, 1985; Urzhumtsev, 
Lunin & Vernoslova, 1989; Jones et al, 1991; Oldfield, 2001; 
Chapman, 1995; see also the discussion of and references to 
earlier original works in Murshudov et al, 1997; Korostelev 
et al, 2002). It is complementary to the more routinely used 
structure-factor-based reciprocal-space refinement. RSR 
optimizes the fit of the atoms to the current electron-density 
map. In phenix.refine the map is computed only once per 
macro-cycle. An RSR iteration is therefore typically much 
faster than a reciprocal-space refinement iteration and it is 
significantly more practical to systematically determine the 
optimal RSR relative weighting of $T_{\text{exp}}$ and $T_{\text{xyz\_restraints}}$ in (3) 
compared with the reciprocal-space refinement weight opti- 
mization outlined in §2.2.3. The RSR weight determination in 
phenix.refine aims to find the largest weight for T exp that still 
produces reasonable geometry. The current model is refined 
independently multiple times, each time using a different trial 
weight from an empirically determined range. The resulting 
geometry is evaluated by computing the maximum and 
average deviation of the model bond distances from ideal 
bond distances. Typically, the RSR procedure increases the R 
factors (work and free) for well refined structures, but for 
resolutions better than 3 Å we often observe important local 
corrections that are beyond the reach of SA (see §3). In such 
cases, subsequent reciprocal-space refinement usually leads to 
lower R factors than before RSR. In cases where the R factors 
increase beyond a user-definable threshold the RSR result is 
automatically discarded. 

2.2.5. Refinement of atomic displacement parameters 
(ADP refinement). An atomic displacement parameter 
(ADP) or B factor is a superposition of a number of nested 
contributions (Dunitz & White, 1973; Prince & Finger, 1973; 
Sheriff & Hendrickson, 1987; Winn et al, 2001) that describe 
relatively small motions (within the validity of harmonic 
approximations), such as the following. 

(i) Local atomic vibration. 

(ii) Motion as part of a rotatable bond. 

(iii) Residue movement as a whole. 

(iv) Domain movement. 

(v) Whole molecule movement. 

(vi) Crystal lattice vibrations. 

This parameterization can be made even more detailed 
(beyond the harmonic approximation; Johnson & Levy, 1974), 
but in practice most modern refinement programs use an 
approximation that consists of three main components (see, 
for example, Winn et al, 2001), 



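$$\mathbf{U}_{\text{total}} = \mathbf{U}_{\text{cryst}} + \mathbf{U}_{\text{group}} + \mathbf{U}_{\text{local}}, \tag{5}$$

where $\mathbf{U}_{\text{total}}$ is the total atomic ADP. 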

$\mathbf{U}_{\text{cryst}}$ is a symmetric 3 × 3 matrix which models the common 
displacement of the crystal as a whole and some additional 
experimental anisotropic effects (Sheriff & Hendrickson, 
1987; Usón et al, 1999). This contribution is exactly the same 
for all atoms and thus it is possible to treat this effect directly 
while performing overall anisotropic scaling (Afonine et al, 
2005a; see equation 1). $\mathbf{U}_{\text{cryst}}$ is forced to obey the crystal 
symmetry constraints. phenix.refine reports refined elements 
of the $\mathbf{U}_{\text{cryst}}$ matrix expressed on a Cartesian basis and uses the 
$B_{\text{cart}}$ notation (Grosse-Kunstleve & Adams, 2002). 



$\mathbf{U}_{\text{group}}$ is used to model the contribution to $\mathbf{U}_{\text{total}}$ arising 
from concerted motions of multiple atoms (group motions). It 
allows the combination of group motion at different levels (for 
example, whole molecule + chain + residue) and the use of 
models of different degrees of sophistication, such as general 
TLS, TLS for a fixed axis (a librational ADP; $\mathbf{U}_{\text{LIB}}$) and a 
simple group isotropic model with one single parameter. In 
its most general form, $\mathbf{U}_{\text{group}}$ can be $\mathbf{U}_{\text{TLS}} + \mathbf{U}_{\text{LIB}} + \mathbf{U}_{\text{subgroup}}$, 
where, for example, $\mathbf{U}_{\text{TLS}}$ would model the motion of the 
whole molecule or a large domain, $\mathbf{U}_{\text{subgroup}}$ would model the 
displacement of a smaller group such as a chain using a simpler 
one-parameter model and $\mathbf{U}_{\text{LIB}}$ would model a side-chain 
libration around a torsion bond using a simplified TLS model 
(Dunitz & White, 1973; Stuart & Phillips, 1985; currently, this 
approach is being implemented in phenix.refine). Depending 
on the current model and data quality, some components 
cannot be used: for example, $\mathbf{U}_{\text{group}}$ may be just $\mathbf{U}_{\text{TLS}}$. 

If the TLS model is used then $\mathbf{U}_{\text{TLS}} = \mathbf{T} + \mathbf{A}\mathbf{L}\mathbf{A}^t + \mathbf{A}\mathbf{S} + \mathbf{S}^t\mathbf{A}^t$ 
with 20 refinable T (translation), L (libration) and S 
(screw-rotation) matrix elements per group (Schomaker & 
Trueblood, 1968). The choice of TLS groups is often subjective 
and may be based on visual inspection of the molecule in an 
attempt to identify distinct and potentially independent 
fragments. A more rigorous and automated approach is 
implemented in the TLSMD algorithm (Painter & Merritt, 
2006a,b). The TLSMD algorithm identifies TLS groups by 
splitting a whole molecule into smaller pieces followed by 
fitting of TLS parameters to the previously refined atomic B 
factors for each piece. Therefore, it is very important that the 
input ADPs for the TLSMD procedure are minimally biased 
by the restraints used in previous refinements and are mean- 
ingful in general (not reset to an arbitrary constant value, for 
example). In PHENIX, TLS groups can be determined fully 
automatically either as part of a refinement run or by using the 
phenix.find_tls_groups tool (Afonine, unpublished work). 
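
As a worked illustration of the TLS expression, the sketch below evaluates the TLS contribution to the ADP of one atom. It assumes the common convention in which $\mathbf{A}$ is the antisymmetric matrix built from the atomic coordinates $(x, y, z)$ taken relative to the TLS origin; the function names are hypothetical and this is not the phenix.refine code.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def add(*matrices):
    """Element-wise sum of any number of 3x3 matrices."""
    return [[sum(m[i][j] for m in matrices) for j in range(3)] for i in range(3)]


def u_tls(t, l, s, xyz, origin=(0.0, 0.0, 0.0)):
    """U_TLS = T + A L A^t + A S + S^t A^t for one atom.

    t, l, s : 3x3 matrices (T and L symmetric); xyz : atom position.
    """
    x, y, z = (xyz[i] - origin[i] for i in range(3))
    # Antisymmetric matrix built from the coordinates (assumed convention).
    a = [[0.0, z, -y], [-z, 0.0, x], [y, -x, 0.0]]
    a_t = transpose(a)
    return add(t, matmul(matmul(a, l), a_t), matmul(a, s), matmul(transpose(s), a_t))
```

For a pure libration about the z axis, an atom at distance x from the axis acquires an extra displacement of $L_{zz}x^2$ perpendicular to both the axis and its position vector, which is why atoms far from the TLS origin show larger ADPs.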

Finally, small (in the harmonic approximation) local atomic 
vibrations, $\mathbf{U}_{\text{local}}$, can be modeled using a less detailed 
isotropic model that uses only one parameter per atom or 
using a more detailed (and accurate) anisotropic parameterization 
that includes six parameters per atom and 
therefore requires more experimental observations to be 



feasible. To enforce physical correctness of the refined ADPs, 
phenix.refine employs ADP restraints. In the case of anisotropic 
ADPs these are simple similarity restraints (Schneider, 1996; 
Sheldrick & Schneider, 1997). For isotropic ADP refinement 
phenix.refine uses sphere ADP restraints first introduced by 
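Afonine et al. (2005b), 

$$T_{\text{adp\_restraints}} = \sum_{i=1}^{N_{\text{atoms}}} \left[ \sum_{j=1}^{M_{\text{atoms}}} \frac{1}{r_{ij}^{p}} \frac{(U_{\text{local},i} - U_{\text{local},j})^2}{(U_{\text{local},i} + U_{\text{local},j})^{q}} \right], \tag{6}$$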



where $N_{\text{atoms}}$ is the total number of atoms in the model, the 
inner sum spans over all $M_{\text{atoms}}$ atoms in the sphere of radius 
$R$ around atom $i$, $r_{ij}$ is the distance between two atoms $i$ and $j$, 
$U_{\text{local},i}$ and $U_{\text{local},j}$ are the corresponding isotropic ADPs and $p$ 
and $q$ are empirical constants. By default, $R$, $p$ and $q$ are fixed 
at empirically derived values of 5.0 Å, 1.69 and 1.03, respectively, 
but they can also be changed by the user. The function 
reduces to a simple pair-wise similarity restraints target if 
p = q = 0 and the radius R is set to be approximately equal to 
the upper limit of a typical bond length. 
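
As a rough illustration of equation (6), the following Python sketch evaluates the sphere restraint target for a small list of atoms. It is not the phenix.refine implementation: the function name sphere_adp_target is hypothetical, crystal symmetry and periodic neighbours are ignored, atoms are assumed to occupy distinct positions, and each pair is counted twice (once from the sphere around each atom), as the double sum implies.

```python
import math


def sphere_adp_target(coords, u_iso, radius=5.0, p=1.69, q=1.03):
    """Sum the restraint term of equation (6) over all atom pairs within `radius`.

    coords: list of (x, y, z) Cartesian positions in Angstroms.
    u_iso:  list of positive isotropic ADPs, one per atom.
    """
    total = 0.0
    for i, (xyz_i, u_i) in enumerate(zip(coords, u_iso)):
        for j, (xyz_j, u_j) in enumerate(zip(coords, u_iso)):
            if i == j:
                continue
            r_ij = math.dist(xyz_i, xyz_j)
            if r_ij > radius:
                continue  # atom j lies outside the sphere around atom i
            total += (u_i - u_j) ** 2 / (r_ij ** p * (u_i + u_j) ** q)
    return total
```

With p = q = 0 every pair inside the sphere contributes the plain squared difference of the two ADPs, which is the pair-wise similarity limit mentioned in the text.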

The implementation of ADP refinement in phenix.refine 
is described in Afonine, Urzhumtsev et al. (2010) and 
Urzhumtsev et al. (2011). 

2.2.6. Occupancy refinement. Atomic occupancies can be 
used to model disorder beyond the harmonic approximation. 
With the default settings, phenix.refine always refines the 
occupancies of atoms in alternative conformations and those 
having partial nonzero occupancies at input (unless instructed 
otherwise by the user). The constraints for the occupancies 
of atoms in alternative conformations are constructed auto- 
matically based on the altLoc identifiers in the input PDB file. 
Also, a user can specify additional constraints on occupancies 
between any selected atoms. One can also perform a group 
occupancy refinement where one occupancy factor is refined 
per selected set of atoms and is constrained between prede- 
fined minimal and maximal values (0 and 1 by default). This 
can be useful for the refinement of partially occupied ligands, 
waters (when H or D are present) or other crystallization- 
solution components (Hendrickson, 1985). In the case of 
refinement of a partially deuterated structure against neutron 
data, the occupancies of exchangeable H/D sites are refined 
automatically and constraints are applied to ensure that the 
sum of related H and D occupancies is 1. phenix.refine does 
not currently build alternative conformations or H/D sites; 
external tools can be used for this, such as phenix.ready_set to
add H/D atoms or Coot (Emsley & Cowtan, 2004; Emsley et
al., 2010) to add side chains in alternate conformations. Fig. 2
shows some typical situations that are addressed automatically 
by phenix.refine. 
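
The automatic constraint construction described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: atoms are represented as hypothetical (chain, residue number, altLoc) tuples rather than parsed PDB records, the function names are invented for this example, and the post hoc normalization merely demonstrates the 'sum of occupancies equals 1' constraint; how phenix.refine enforces it during refinement is not shown here.

```python
from collections import defaultdict


def conformer_groups(atoms):
    """Group atom indices by (chain, residue number) and then by altLoc.

    atoms: list of (chain_id, residue_number, altloc) tuples; an empty
    altloc string means the atom has no alternative conformation.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for index, (chain, resseq, altloc) in enumerate(atoms):
        if altloc:  # atoms without an altLoc identifier are left unconstrained
            groups[(chain, resseq)][altloc].append(index)
    return {key: dict(conformers) for key, conformers in groups.items()}


def normalize_occupancies(raw):
    """Clamp one occupancy per conformer to [0, 1] and rescale to sum to 1."""
    clamped = {alt: min(max(occ, 0.0), 1.0) for alt, occ in raw.items()}
    total = sum(clamped.values())
    if total == 0.0:
        return {alt: 1.0 / len(clamped) for alt in clamped}
    return {alt: occ / total for alt, occ in clamped.items()}
```

Each conformer carries a single occupancy shared by all of its atoms, which mirrors the requirement that all atoms within a conformer have identical occupancies.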

2.2.7. Refinement of dispersive and anomalous coefficients
(f′ and f″). Given data with a significant anomalous signal,
improved refinement results can be obtained by refining the
coefficients f′ and f″ of the anomalously scattering atoms
(usually heavy atoms) and including them in the calculation of
structure factors. Most commonly there is only one type of
anomalous scatterer and it is reasonable to assume that the f′
and f″ coefficients are identical for all anomalous scatterers of
the same type in the asymmetric unit. In this case the
data-to-parameter ratio is very high and the refinement of the
anomalous coefficients is very stable. Often it is possible to
initiate refinement with f′ = 0 and f″ = 0. For rare cases,
phenix.refine also supports refinement of an arbitrary number
of sets of f′ and f″. Initial values may need to be specified in
these cases.
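
To make the role of f′ and f″ in the structure-factor calculation concrete, the following Python sketch uses a one-dimensional toy cell in which each atom scatters with the complex factor f0 + f′ + if″. The toy geometry, the function names and the numerical values are assumptions made for illustration only; they are not taken from phenix.refine.

```python
import cmath
import math


def structure_factor(h, atoms):
    """Structure factor of a 1D toy cell.

    atoms: list of (x_fractional, f0, f_prime, f_double_prime).
    """
    total = 0j
    for x, f0, f_prime, f_double_prime in atoms:
        scattering = complex(f0 + f_prime, f_double_prime)
        total += scattering * cmath.exp(2j * math.pi * h * x)
    return total


def bijvoet_difference(h, atoms):
    """|F(+h)| - |F(-h)|; zero when no atom has a nonzero f''."""
    return abs(structure_factor(h, atoms)) - abs(structure_factor(-h, atoms))
```

A nonzero f″ on one atom breaks the equality of |F(+h)| and |F(-h)|, which is the anomalous signal that makes these coefficients refinable.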



2.3. Refinement output 

The following output is generated at the end of each 
phenix.refine run. 

(i) A PDB file with the refined model and a summary of the 
refinement statistics in its header. The file header also contains 
'REMARK 3' formatted records with refinement, model and 
data statistics, making it ready for PDB deposition. 

(ii) A LOG file. A copy of the information that is printed 
to standard out during refinement. It contains the refinement 
statistics reported as the refinement progresses. 

(iii) An MTZ file with four sections: (1) a copy of the input
data (intensities or amplitudes) with associated error
estimates (σs), R_free flags (if any) and Hendrickson-Lattman
coefficients (if any); (2) data used in refinement (F_obs and
corresponding σs); (3) total model structure factors F_model and
(4) a number of Fourier map coefficients for the maps that can
be visualized by the graphical program Coot. The data used in
refinement may differ from the original input data as (a) the
user can specify resolution and σ cutoffs, (b) phenix.refine
performs outlier filtering and (c) if the input data are in the
form of intensities phenix.refine will automatically convert
them to amplitudes using the French and Wilson algorithm
(French & Wilson, 1978).

Figure 2

Illustration of typical scenarios for occupancy refinement that phenix.refine handles automatically.
(a) Residue having several alternative conformations marked with altLoc identifiers (two in this
example, A and B). It is essential that all conformers have identical chain identifiers and residue
numbers, while residue names can be different as shown in example (e). All atoms within each
conformer must have identical occupancies. The sum of occupancies over all conformers is
constrained to 1. (b) Single atoms with occupancy not equal to 0 or 1. (c) Exchangeable H/D sites
(used in refinement against neutron data collected from a partially deuterated sample). (d)
Single-residue molecule with identical occupancies for all atoms (but not equal to 1 or 0). A user can
override this behavior and/or define constraints for any number of selected atoms or groups of
atoms.

(iv) A GEO file. This file lists all of the geometry restraints 
used in refinement, making it easy to inspect every restraint 
(type, ideal and current starting values where applicable) 
applied to an atom in question. Optionally, phenix.refine can 
also output a second GEO file that shows the value of each 
geometry restraint after refinement. 

(v) An EFF file that contains all the parameters used in the
refinement run (this includes parameters specified on the
command line, in a parameter file and default settings), and a
DEF file with the parameters for a subsequent run.

2.3.1. Map calculation and output. In general, phenix.refine
can output weighted p*mF_obs - q*DF_model and unweighted
p*F_obs - q*F_model maps, where p and q can be any
user-specified numbers. The phases used for computing these maps
are either taken from the current model or the combination
of model phases with the experimentally derived phases (if
available). By default, phenix.refine outputs an MTZ file with
several sets of Fourier map coefficients.

(i) Two 2mF_obs - DF_model maps, where one is computed using
the F_obs used in refinement and the other is computed using
manipulated F_obs, where any missing F_obs are 'filled' in with
DF_model (see below for details). To avoid any confusion, this
is clearly indicated in the output MTZ file with map
coefficients.

(ii) A difference mF_obs - DF_model map.

(iii) For anomalous data, if Bijvoet mates F_obs(+) and F_obs(-)
are available, phenix.refine automatically outputs an anomalous
difference map {[F_obs(+) - F_obs(-)]/2i}exp(iφ) computed with
the model phase φ, where the imaginary unit i in the denominator
introduces a -90° phase shift (see, for example, Roach, 2003).

The coefficients m and D of likelihood-weighted maps (Read,
1986) are computed using the test set of reflections as
described in Lunin & Skovoroda (1995) and Urzhumtsev et al.
(1996). Other map types can also be output, such as average
kick maps (AK maps; Guncar et al., 2000; Turk, 2007; Praznikar
et al., 2009) and B-factor sharpened maps (see Brunger et al.,
2009 and references therein) with the sharpening B factors
determined automatically.
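
The map coefficients listed above can be written out for a single reflection as in the following Python sketch. The weights m and D are taken as given inputs (their estimation from the test set is not attempted), phases are in radians, and the function names are hypothetical rather than part of any PHENIX interface.

```python
import cmath


def map_coefficients(f_obs, f_model, phi_model, m, d):
    """Return complex (2mFobs - DFmodel, mFobs - DFmodel) coefficients.

    f_obs, f_model: amplitudes; phi_model: model phase in radians;
    m, d: likelihood weights, taken here as given inputs.
    """
    phase = cmath.exp(1j * phi_model)
    two_mfo_dfc = (2.0 * m * f_obs - d * f_model) * phase
    mfo_dfc = (m * f_obs - d * f_model) * phase
    return two_mfo_dfc, mfo_dfc


def anomalous_difference_coefficient(f_plus, f_minus, phi_model):
    """{[F(+) - F(-)] / 2i} exp(i phi); dividing by i shifts the phase by -90 degrees."""
    return (f_plus - f_minus) / 2j * cmath.exp(1j * phi_model)
```

For a model phase of zero and F_obs(+) > F_obs(-), the anomalous coefficient is purely imaginary and negative, which is the -90° shift noted in item (iii).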

Figure 3

The graphical user interface (GUI) for phenix.refine. (a) Configuration tab showing the refinement strategy
and commonly used restraint and optimization settings. (b) Display of results, including summary of output
files, tables and graphs of statistics and links to molecular-graphics software.

It is known that data incompleteness, especially systematic
incompleteness (missing planes or cones of reciprocal space),
can cause mild to severe map distortions (Lunin, 1988;
Urzhumtsev, Lunin & Luzyanina, 1989; Lunin & Skovoroda,
1991; Tronrud, 1996; Lunina et al., 2002; Urzhumtseva &
Urzhumtsev, 2011). To compensate for data incompleteness,
phenix.refine will 'fill' in missing F_obs observations with
certain calculated values to reduce these map distortions.
However, this procedure may introduce model bias and obviously
the less complete the data, the higher the risk. By default,
missing F_obs are 'filled' in with DF_model [similar to the
procedure used in the REFMAC program (Murshudov et al., 1997,
2011)], but there are other options possible, such as filling
with ⟨F_obs⟩, where the F_obs are averaged out in a resolution
bin around the missing F_obs, filling with simply F_model or
even filling with random numbers generated around ⟨F_obs⟩.
Based on a limited number of tests, all of the above 'filling'
schemes produce similar results, indicating the dominance of
the phases rather than the amplitudes of the filled
reflections. Clearly, this subject needs more systematic and
thorough research (work in progress). However, one can
effectively use both maps simultaneously, using the 'filled'
map to help overcome difficult cases and using the unfilled map
to confirm that map features have not been over-interpreted
owing to model bias. For presentation purposes, it is
recommended that unfilled maps be used so as to minimize any
chance of misleading the viewer.
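
The default 'filling' scheme can be sketched in a few lines of Python. This is an assumption-laden illustration, not PHENIX code: reflections are held in a hypothetical dictionary keyed by Miller index, a missing observation is marked with None, and the weight D for an unmeasured reflection is assumed to be available (for example from its resolution bin).

```python
def two_mfo_dfc_amplitudes(reflections, fill_missing=True):
    """Return {hkl: (amplitude, was_filled)} for a 2mFobs - DFmodel map.

    reflections: {hkl: (f_obs or None, f_model, m, d)}; None marks an
    unmeasured reflection.
    """
    result = {}
    for hkl, (f_obs, f_model, m, d) in reflections.items():
        if f_obs is not None:
            result[hkl] = (2.0 * m * f_obs - d * f_model, False)
        elif fill_missing:
            result[hkl] = (d * f_model, True)  # default scheme: fill with DFmodel
        # with fill_missing=False the unmeasured reflection is simply omitted
    return result
```

Keeping the was_filled flag makes it straightforward to produce both the filled and the unfilled coefficient sets from the same data, in line with the recommendation to inspect both maps.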

2.4. H atoms in refinement

H atoms constitute about 50% of the atoms in a macromolecular
structure, playing a crucial role in interatomic contacts (see,
for example, Chen et al., 2010 and references therein). H atoms
also contribute to the atomic X-ray scattering (to F_model).
Information about H atoms (both geometry and scattering) should
therefore be used in refinement. In phenix.refine there are a
number of tools that make handling of H atoms as easy and as
automatic as possible at all resolutions and using any
diffraction data source (X-ray, neutron or both
simultaneously). A detailed overview of using H atoms in
refinement can be found in Afonine, Mustyakimov et al. (2010).

Table 1

R factors, Ramachandran plot outliers (RO) and MolProbity clashscores (CS; Davis et al., 2007) for selected structures extracted from the PDB
(published), extracted from PDB_REDO and after refinement using phenix.refine.

All data cutoffs (resolution, σ) were applied as reported in the original works in order to maintain the same reflections used in the calculations.

Column groups: Extracted from REMARK records; Calculated with PHENIX.

† Calculated in PHENIX after applying resolution and σ cutoffs, as reported in the PDB file header. ‡ R factors as reported on the PDB_REDO web site (Joosten et al., 2009; http://www.cmbi.ru.nl/pdb_redo/); RO and CS computed using PHENIX. § Re-refinement of PDB-deposited structures using phenix.refine. Refinement strategy (model parameterization
and number of refinement macro-cycles) varies depending on model and data quality. See text for details. ¶ PDB or NDB (Berman et al., 1992) code. †† na, value is not available
either owing to a missing cross-validation set of reflections or the entry is not available in the database.

2.5. Specific tools for refinement at subatomic resolution 

At subatomic resolution (see Urzhumtsev et al., 2009 for a
discussion of this definition), the residual electron-density 
maps begin to show some additional features that are not 
visible at lower resolutions, such as (i) density peaks for H 
atoms (for both macromolecule and water H atoms), (ii) 
electron-density peaks at interatomic bonds owing to bonding 
effects, (iii) lone-pair electrons and (iv) specific densities for 
ring-conjugated systems. The amount of these features visible 
in residual maps is a function of model quality and data 
resolution. 

If a model is refined at ultrahigh resolution and the above
features are not modeled, this model can be considered to be
incomplete. It is well known that refining an incomplete model
can have a negative effect on all model parameters, for example
coordinates and B factors (Lunin et al., 2002; Afonine et al.,
2004). In addition, when refining a structure at such a high
resolution one usually looks for very fine structural details (for
example, Dauter et al., 1995, 1997; Vrielink & Sampson, 2003;
Petrova & Podjarny, 2004), which are often only seen as subtle
features in residual maps close to the noise level. Completing
the model is well known to improve the map quality (by
reducing noise) and this is clearly demonstrated for the case
of subatomic resolution residual maps (Afonine et al., 2007;
Volkov et al., 2007).



phenix.refine possesses a number of tools specifically dedi- 
cated to model completion and refinement at subatomic 
resolution. 

(i) Unrestrained coordinate and ADP refinement. 

(ii) IAS model to address residual bonding density
(Afonine et al., 2007).

(iii) Individual or riding model for H atoms.

(iv) Automatic mF_obs - DF_model map-based location and
optimization of water H atoms.

(v) Choice between FFT and direct-summation algorithms if 
the accuracy of the structure-factor calculation is of concern. 



2.6. Specific tools for refinement at low resolution 

At low resolution (~3.5 Å and worse), the electron-density
map often provides little atomic detail and the traditional set
of local restraints (bonds, angles, planarities, chiralities,
dihedrals and nonbonded interactions) is insufficient to
maintain known higher order structural organization (secondary
structure) as well as other local geometry characteristics that
are not directly restrained during refinement against higher
resolution data (for example, peptide φ and ψ angles). At
these low resolutions it is essential to include more a priori or
external information in order to assure the overall correctness
of the model. This information can be expressed through
restraints to a known similar higher resolution (or
homologous) 'reference' structure (if available), to known
secondary-structure elements or to target peptide φ and ψ
angles in the Ramachandran plot. All these tools have recently
been implemented in phenix.refine and details are discussed in
this issue (Headd et al., 2012).

Figure 4

Polygon images (Urzhumtseva et al., 2009) before (left) and after (right) re-refinement in
phenix.refine for structures 1eic, 1g2y, 2elg and 2ppn. In all cases the polygon computed for
structures before re-refinement in phenix.refine indicates one or more problems, for example high
R_free and R_work and too small bond r.m.s.d. for 1eic or very high R factors and geometry deviations
for 2elg (vertices are on the furthermost end of the histogram bar). Re-refinement in phenix.refine
resulted in polygon vertices moved towards the center (squeezing the polygon) in most cases,
indicating improvement of the corresponding model characteristics.



Given low-resolution data, if there 
are several copies of a molecule in the 
asymmetric unit one can assume that 
these copies are essentially similar and 
therefore noncrystallographic symmetry 
(NCS) restraints can be applied to 
coordinates and ADPs (Hendrickson, 
1985). This improves the data-to-para- 
meter ratio at low resolution and 
therefore reduces the risk of overfitting 
(DeLaBarre & Brunger, 2006; for a 
practical example, see Braig et al., 1995;
it has been noted that nearly half of the 
low-resolution structures in the wwPDB 
contain NCS copies; see, for example, 
Kleywegt & Jones, 1995; Kleywegt, 
1996). 

In phenix.refine the coordinates and 
ADPs of NCS copies are harmonically 
restrained to the positions and ADPs of 
an average structure that is obtained by 
superposition and averaging of the NCS 
copies (Hendrickson, 1985). The NCS 
restraint term is added as an additional 
harmonic function to the geometry or 
ADP restraints terms. In ADP refinement
the NCS restraints are only
applied to U_local (Winn et al., 2001;
Afonine, Urzhumtsev et al., 2010).
Selections for NCS groups can either 
be provided by the user or they can 
be determined automatically. Currently, 
phenix.refine uses a simple algorithm 
for automatic NCS detection which is 
based on sequence alignment of the 
chains provided in the input PDB file. 
The automatically generated NCS 
groups should therefore be considered 
as a guide in generating a complete set 
of NCS restraints rather than as a best 
final answer. 
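
The harmonic NCS restraint to an average structure can be illustrated with the Python sketch below. It assumes that the NCS copies have already been superposed onto a common frame and that atoms are listed in the same order in every copy; the function name and the single uniform weight are hypothetical choices for this example and do not describe the actual weighting used in phenix.refine.

```python
def ncs_restraint_target(copies, weight=1.0):
    """Harmonic restraint of each NCS copy to the average of all copies.

    copies: list of NCS copies, each a list of (x, y, z) tuples that are
    already superposed and listed in matching atom order.
    Returns (average_structure, target_value).
    """
    n_copies = len(copies)
    n_atoms = len(copies[0])
    average = [
        tuple(sum(copy[a][k] for copy in copies) / n_copies for k in range(3))
        for a in range(n_atoms)
    ]
    target = 0.0
    for copy in copies:
        for atom, mean in zip(copy, average):
            target += weight * sum((c - m) ** 2 for c, m in zip(atom, mean))
    return average, target
```

The target is zero only when all copies coincide, which is why genuinely different regions should be excluded from the NCS groups rather than being pulled towards the average.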

If insufficient care is taken in defining the NCS groups, the
above method may be counterproductive (Kleywegt & Jones, 1995;
Kleywegt, 1996, 1999, 2001; Uson et al., 1999). It is important
not to use NCS restraints for truly variable fragments that are
different between the NCS copies (certain side chains, flexible
loops etc.), otherwise they will be forced to match the average
structure, producing various local artifacts. An alternative
approach restraining local interatomic distances has been
published by Uson et al. (1999) and is used in SHELXL
(Sheldrick, 2008). A similar approach using NCS restraints
parameterized in torsion-angle space is available in
phenix.refine.



2.7. GUI 

The graphical interface for phenix.refine retains most of the
functionality of the command-line program, with the same
parameter template used to draw controls in the GUI (in
many cases automatically). However, the arrangement and
visibility of the controls have been tailored to minimize
confusion for novice users, with only the most commonly used
options displayed in the main window (Fig. 3a). In the
windows for individual protocols, advanced options are hidden
by default, but may be toggled by a 'user-level' control.
Several extensions in the GUI provide additional automation
via links to other programs such as phenix.ready_set,
phenix.simple_ncs_from_pdb, phenix.find_tls_groups and
phenix.xtriage, all of which may be run interactively to
generate parameters that are incorporated into the
phenix.refine inputs. For parameters that define atom
selections, a built-in graphical viewer allows dynamic
visualization and modification of the selection. During and
after refinement, progress is presented graphically as a plot
showing the current R factors and geometry after each step.
The final results (Fig. 3b) include buttons to load the refined
model and electron-density maps in Coot or PyMOL (DeLano,
2002). A comprehensive suite of validation tools largely
derived from MolProbity (Davis et al., 2007; Chen et al., 2010)
is run as the final step of refinement and these analyses are
integrated into the display of results.

Figure 5

Selected examples of (2mF_obs - DF_model, φ_model) nuclear density map improvement after
re-refinement of structure 1c57 (neutron data). Left, original structure; right, after re-refinement in
phenix.refine. Maps are contoured at the 1.5σ level. Note the improved orientation of exchangeable
H/D atoms at Ser and Tyr O atoms. The systematic lack of density around H atoms is a consequence
of the negative scattering length of H atoms and related density-cancellation effects (Afonine,
Mustyakimov et al., 2010).



3. Selected examples 

In this section, we illustrate the appli- 
cation of phenix.refine to a broad range 
of refinement cases (Table 1). Standard 
protocols were used as dictated by the 
resolution of the diffraction data and 
the model characteristics. The refine- 
ment protocols were not manually 
optimized to produce the lowest free R 
factors. 

3.1. Low-resolution structures 

The structures with PDB entries 1jl4 (Wang et al., 2001), 2gsz
(Satyshur et al., 2007), 1yi5 (Bourne et al., 2005), 2wjx
(Clayton et al., 2009), 3eob (Li et al., 2009), 1av1
(Brouillette & Anantharamaiah, 1995), 3bbw (Qiu et al., 2008)
and 2i07 (Janssen et al., 2006) were selected because their
published R factors are much higher than expected
(Urzhumtseva et al., 2009). We were interested to test whether
it was possible to improve their refinement using
phenix.refine in a straightforward fashion. Since all of these
structures are reported at low resolution (4 Å or lower) the
phenix.refine refinement included NCS (where available),
secondary-structure and Ramachandran plot restraints for
refinement of coordinates and a restrained isotropic model for
the refinement of atomic displacement parameters. A
bulk-solvent mask optimization was also performed (Brunger,
2007; DeLaBarre & Brunger, 2006). In all cases the R factors
(both free and work) were reduced significantly and in two of
them overlooked twinning was a likely cause of the unusually
high published R factors. For structure 3bbw twinning was
detected by phenix.xtriage and the corresponding twin operator
was used in refinement.



3.2. Impact of ADP refinement

The re-refinement of a synaptotagmin structure at 3.2 Å
resolution (PDB entry 1dqv; Sutton et al., 1999) emphasizes
the importance of using a TLS parameterization not only as a
way to reduce the number of refined parameters but more
importantly to provide a more reasonable model for global
domain motions (Urzhumtsev et al., 2011). Restrained
refinement of individual ADPs in phenix.refine reduces the
published R_work/R_free from 29.3/34.8% to 25.5/29.3%. Further
combined refinement of TLS parameters and individual ADPs
reduced R_work/R_free to 22.5/25.5%.

3.3. High-resolution refinement

Given the relatively high resolution of 1.4 Å, the structure
1eic (Chatani et al., 2002) has surprisingly high values of
R_free and R_work, as well as unusually small bond and angle
deviations from ideal values (Fig. 4). Re-refinement with all
anisotropic ADPs, automatic water update, target-weight
optimization and added riding H atoms significantly improved
these statistics. Other structures, 2elg (Ohishi et al., 2007),
1g2y (Rose et al., 2000) and 2ppn (Szep et al., 2009), were also
selected on the basis of unusually high R factors. Re-refining
the models with added riding H atoms, anisotropic ADPs for all
atoms except H atoms and automated water update resulted in a
significant improvement in R factor and other statistics as
illustrated by polygon images (Urzhumtseva et al., 2009;
Fig. 4).

3.4. Refinement against neutron data at
medium and ultrahigh resolution

The structure 1c57 (Habash et al., 2000) was obtained from a
partially deuterated sample at 2.4 Å resolution. However, the
PDB model does not contain any D atoms, resulting in the
recalculated R_work of 30.0% and R_free of 33.9% being higher
than the published values (27.0% and 30.1%, respectively).
Automated rebuilding of H and H/D exchangeable atoms using
phenix.ready_set followed by refinement in phenix.refine
yielded significantly improved R_work and R_free factors of
20.4% and 25.7%, respectively (Table 1). The overall map
improvement is also clear (Fig. 5a). A number of rotatable H/D
sites were reoriented into improved nuclear density by local
real-space optimization (Figs. 5b and 5c). As another example,
the availability of subatomic resolution data (0.65 Å) for the
ur0013 structure (Guillot et al., 2001) allowed partially
unrestrained positional and all-atom anisotropic ADP
refinement (including H atoms).



3.5. Combined real- and reciprocal-space refinement (dual- 
space refinement) 

Figure 6

Structures after refinement of a severely distorted model, shown in (a), using different
refinement protocols: (b) dual-space refinement, (c) refinement using minimization only and
(d) combined refinement using minimization and simulated annealing. The best available
refined model is shown in gray in all panels.

To illustrate the power of the dual-space refinement
protocol implemented in phenix.refine, we selected a structure
from the PDB (PDB entry 1txj; Vedadi et al., 2007) and moved
atoms in such a way that the amount of introduced distortion
is likely to put it beyond the convergence radius of traditional
reciprocal-space minimization-based refinement. The model
distortions included (i) switching to a different rotamer for
each residue side chain; (ii) randomly moving (shaking using
phenix.pdbtools) all coordinates with an r.m.s. coordinate shift
of 1 Å followed by geometry regularization (also using
phenix.pdbtools); (iii) removing all solvent and (iv) resetting
all ADPs to the average value computed across all atoms. This
resulted in an overall coordinate distortion r.m.s.d. of about
2.1 Å (Fig. 6a) and an increase of the best available
R_work/R_free from 18.7/21.2% to 53.2/54.4%. Subsequently, we
performed three independent refinement runs, each starting
from the same distorted model. All refinements included ten
macro-cycles of coordinate and isotropic ADP refinement
combined with ordered solvent (water) updates. Coordinates in
the first refinement were refined using L-BFGS minimization
only. The second refinement included L-BFGS minimization and
Cartesian simulated annealing performed during the first five
macro-cycles. Finally, the third refinement was similar to the
second one but included overall real-space refinement and
local torsion-angle grid-search real-space correction of
residues to best fit the density map and match the closest
plausible rotameric state. The R_work and R_free after the three
refinements were 46.1/52.2%, 41.5/48.8% and 20.8/23.7%,
respectively. The refined models are shown in Figs. 6(c), 6(d)
and 6(b), respectively. Clearly, the new dual-space refinement
protocol was able to bring the distorted model back close to
the best available refined model, while both simple
minimization and combined minimization and simulated
annealing failed to do so.
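
The coordinate 'shaking' step of the distortion protocol, random shifts with a prescribed r.m.s. magnitude, can be sketched in Python as follows. This is not the phenix.pdbtools algorithm: the Gaussian shift distribution, the function names and the seed handling are assumptions for illustration, and the subsequent geometry regularization is not attempted.

```python
import math
import random


def shake_coordinates(coords, target_rms=1.0, seed=0):
    """Add random shifts whose r.m.s. magnitude over all atoms equals target_rms."""
    rng = random.Random(seed)
    shifts = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in coords]
    rms = math.sqrt(
        sum(s[0] ** 2 + s[1] ** 2 + s[2] ** 2 for s in shifts) / len(shifts)
    )
    scale = target_rms / rms  # rescale so the r.m.s. shift is exactly target_rms
    return [
        tuple(c + scale * s for c, s in zip(xyz, shift))
        for xyz, shift in zip(coords, shifts)
    ]


def rms_shift(coords_a, coords_b):
    """Root-mean-square distance between matching atoms of two coordinate sets."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

Fixing the seed makes the distorted starting model reproducible, so that different refinement protocols can be compared from exactly the same starting point.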
Figure 7

(a) Ensemble of structures illustrating the outcome of 100
simulated-annealing refinement runs that were identical apart from the
random seed. (b) The distribution of R_work and R_free corresponding to
each structure of the ensemble.



3.6. Including H atoms in refinement 

To illustrate the contribution of H atoms to refinement, we
selected a structure from the PDB (PDB entry 3aci; Tsukimoto
et al., 2010) which was refined at 1.6 Å resolution to
R_work = 14.1% and R_free = 18.8%. This structure was then
refined with and without H atoms. Both refinement runs included
three macro-cycles of positional and isotropic ADP refinement,
automated water update and X-ray/restraints target-weight
optimization. The refinement without H atoms yielded
R_work = 14.6% and R_free = 18.3%. Refinement with H atoms
resulted in R_work = 13.7% and R_free = 16.5%. We suggest that
it is prudent to preserve the H atoms in the final model (and to
record them in the PDB deposition file), as omitting them
increases the R_work and R_free to 15.1% and 17.8%, respectively.
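
For readers who want to reproduce comparisons of this kind, the conventional crystallographic R factor, the sum of | |F_obs| - k|F_model| | divided by the sum of |F_obs|, can be computed as in the Python sketch below. The function names are hypothetical, the overall scale k is assumed to be known (1.0 by default), bulk-solvent and other corrections are ignored, and both the work and test sets are assumed to be non-empty.

```python
def r_factor(f_obs, f_model, scale=1.0):
    """R = sum | |Fobs| - scale*|Fmodel| | / sum |Fobs| over the given reflections."""
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_model))
    return numerator / sum(abs(fo) for fo in f_obs)


def split_r_factors(f_obs, f_model, free_flags):
    """Return (R_work, R_free) given a per-reflection boolean test-set flag."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_model, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_model, free_flags) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))
```

Evaluating the same model with and without its H-atom contribution to F_model and comparing the two pairs of values is the kind of test reported in this section.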



4. Remark regarding uncertainties in refinement results 

Given that the landscape of a macromolecular crystallography
refinement target is very complex and the convergence radii of
refinement protocols are generally very small in comparison,
the outcome of a refinement run may strongly depend on the
initial model and algorithmic parameters in ways that at first
sight may not seem important. To illustrate this, we performed
100 identical SA refinement runs for a structure at 2 Å
resolution, where the only difference between each refinement
run was the random seed used to assign initial random
velocities. The result is an ensemble of structures that are
all similar in general but slightly different in detail
(Fig. 7a). The variation of structures within the ensemble
reflects two phenomena: refinement artifacts (limited
convergence radius and speed) and (probably to a lesser degree)
structural variability (Terwilliger et al., 2007). The spread
of the ensemble broadens as the upper resolution limit becomes
worse. The R factors also deviate further (Fig. 7b). This
variation is always important to keep in mind when comparing
refinement results (R factors, for example) obtained with
different refinement strategies or slightly different starting
models.
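
One simple way to act on this advice is to compare an observed R-factor difference with the run-to-run spread from repeated refinements that differ only in the random seed. The Python sketch below does this with the standard library; the function names and the two-standard-deviation threshold are assumptions chosen for illustration, not a criterion proposed in this paper.

```python
import statistics


def r_free_spread(r_free_values):
    """Mean and sample standard deviation of R_free over repeated runs."""
    return statistics.mean(r_free_values), statistics.stdev(r_free_values)


def difference_exceeds_spread(r_a, r_b, r_free_values, n_sigma=2.0):
    """True if |r_a - r_b| is larger than n_sigma times the run-to-run spread."""
    _, sigma = r_free_spread(r_free_values)
    return abs(r_a - r_b) > n_sigma * sigma
```

A difference that does not exceed the spread may simply reflect the seed-dependent variation described above rather than a real advantage of one strategy.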
5. Conclusions

phenix.refine provides a comprehensive set of tools for 
refinement across a broad range of resolution limits (sub- 
atomic to low) using X-ray, neutron or both types of data 
simultaneously. A high degree of automation and robustness 
allows a range of refinement strategies to be used from a 
nearly 'black box'-like default mode to the option of 
customizing more than 500 control parameters. All standard 
tools available for refinement using X-ray data are also 
available for refinement using neutron data. Any combination 
of available refinement strategies can be applied to any 
selected part of the structure. The GUI makes phenix.refine 
easy to use for both novice and experienced crystallographers. 

The most recent developments include new or improved
tools for refinement against low-resolution data (~3.5 Å and
lower), such as reference-model, secondary-structure and
Ramachandran plot restraints, the latter being recommended
in only the most challenging of circumstances such as very
low resolution. NCS restraints parameterized in torsion-angle
space will eliminate the need for subjective and often tedious
selection of NCS groups. An improved target-weight
optimization protocol is designed not only to yield a refined
model with the best R_free but also to maintain the
R_free - R_work gap and model geometry within expected limits.
A fast TLS group-determination algorithm allows fully automated
assignment of TLS groups as part of the refinement run. Our
initial results incorporating real-space methods into the
refinement protocol (dual-space refinement) show a significant
increase in the convergence radius of refinement that is not
typically achievable using only reciprocal-space methods.

Future development plans include further improvements of 
the tools for low-resolution refinement, the expanded use 
of real-space methods for fast local model completion and 
rebuilding, the implementation of twinning-specific maximum- 
likelihood targets, methods for refinement of very incomplete 
atomic models, better modeling of local structural anisotropy 
and improving the bulk-solvent model to account for hydro- 
phobic cores and alternative conformations. More automated 
decision-making will also be implemented for determining the 
optimal model parameterization and refinement strategy for 
different situations. 

Finally, others have shown (Joosten et al., 2009) that it is
possible to apply modern refinement and model-rebuilding 
algorithms to improve structures deposited in major public 
databases such as the PDB. A number of examples in this 
manuscript illustrate that the application of methods in the 
phenix.refine program can potentially extend these improve- 
ments and lead to even better models. 

The PHENIX software is available at
http://www.phenix-online.org free of charge for academic users
and through a consortium for commercial users.

The authors would like to thank the NIH (grant GM063210 
and its ARRA supplement) and the Phenix Industrial 
Consortium for support of the PHENIX project. This work 
was supported in part by the US Department of Energy under
Contract No. DE-AC02-05CH11231. We are grateful to all